Abstract

The use of annotations, referred to as assertions or contracts, to describe program properties
for which run-time tests are to be generated, has become frequent in dynamic programming
languages. However, the frameworks proposed to support such run-time testing generally incur
high time and/or space overheads over standard program execution. We present an approach
for reducing this overhead that is based on the use of memoization to cache intermediate
results of check evaluation, avoiding repeated checking of previously verified properties.
Compared to approaches that reduce checking frequency, our proposal has the advantage of being
exhaustive (i.e., all tests are checked at all points) while still being much more efficient than
standard run-time checking. Compared to the limited previous work on memoization, it
performs the task without requiring modifications to data structure representation or checking
code. While the approach is general and system-independent, we present it for concreteness
in the context of the Ciao run-time checking framework, which allows us to provide an
operational semantics with checks and caching. We also report on a prototype implementation
and provide experimental results indicating that a relatively small cache leads
to significant decreases in run-time checking overhead. To appear in Theory and Practice of
Logic Programming (TPLP), Proceedings of ICLP 2015.


1 Introduction 


The use of annotations to describe program properties for which run-time tests are
to be generated has become frequent in dynamic programming languages, including
assertion-based approaches in (Constraint) Logic Programming ((C)LP) (?; Puebla
et al. 1997; Bueno et al. 1997; Boye et al. 1997; Hermenegildo et al. 1999; Puebla
et al. 2000b; Lai 2000; Hermenegildo et al. 2005; Mera et al. 2009), soft/gradual typing
in functional programming (Cartwright and Fagan 1991; Findler and Felleisen 2002;
Tobin-Hochstadt and Felleisen 2008; Dimoulas and Felleisen 2011), and contract-based
extensions in object-oriented programming (Lamport and Paulson 1999; Fahndrich and
Logozzo 2011; Leavens et al. 2007). However, run-time testing in these frameworks
can generally incur a high penalty in execution time and/or space over the standard
program execution without tests. A number of techniques have been proposed to date
to reduce this overhead, including simplifying the checks at compile time via static
analysis (Puebla et al. 1997; Bueno et al. 1997; Hermenegildo et al. 1999) or reducing
the frequency of checking, including for example testing only at a reduced number of
points (Mera et al. 2009; Mera et al. 2011).

Our objective is to develop an approach to run-time testing that is efficient while
being minimally obtrusive and remaining exhaustive. We present an approach based on
the use of memoization to cache intermediate results of check evaluation in order to
avoid repeated checking of previously verified properties over the same data structure.
Memoization has of course a long tradition in (C)LP in uses such as tabling
resolution (Tamaki and Sato 1986; Dietrich 1987; Warren 1992), including also sharing and
memoizing tabled sub-goals (Zhou and Have 2012), for improving termination.
Memoization has also been used in program analysis (Warren et al. 1988; Muthukumar and
Hermenegildo 1992), where tabling resolution is performed using abstract values.
However, in tabling and analysis what is tabled are call-success patterns and in our case the
aim is to cache the results of test execution.

While the approach that we propose is general and system-independent, we will
present it for concreteness in the context of the Ciao run-time checking framework. The
Ciao model (Hermenegildo et al. 1999; Puebla et al. 2000b; Hermenegildo et al. 2005)
is well understood, and different aspects of it have been incorporated in popular (C)LP
systems, such as Ciao, SWI, and XSB (Hermenegildo et al. 2012; Swift and Warren
2012; Mera and Wielemaker 2013). Using this concrete model allows us to provide
an operational semantics of programs with checks and caching, as well as a concrete
implementation from which we derive experimental results. We also present a program
transformation for implementing the run-time checks that is more efficient than previous
proposals (Puebla et al. 2000b; Mera et al. 2009; Mera et al. 2011). Our experimental
results provide evidence that using a relatively small cache leads to significant decreases
in run-time checking overhead.


2 Preliminaries 

Basic notation and standard semantics. We recall some concepts and notation from
standard (C)LP theory. An atom has the form $p(t_1, \ldots, t_n)$ where $p$ is a predicate symbol
of arity $n$ and $t_1, \ldots, t_n$ are terms. A constraint is a conjunction of expressions built
from predefined predicates (such as term equations or inequalities over the reals) whose
arguments are constructed using predefined functions (such as real addition). A literal
is either an atom or a constraint. A goal is a finite sequence of literals. A rule is of the
form H :- B where H, the head, is an atom and B, the body, is a possibly empty finite
sequence of literals. A constraint logic program, or program, is a finite set of rules.

The definition of an atom A in a program, defn(A), is the set of variable renamings
of the program rules s.t. each renaming has A as a head and has distinct new local
variables. We assume that all rule heads are normalized, i.e., H is of the form $p(X_1, \ldots, X_n)$
where the $X_1, \ldots, X_n$ are distinct free variables. Let $\bar{\exists}_L \theta$ be the constraint $\theta$ restricted
to the variables of the syntactic object $L$. We denote constraint entailment by $\models$, so
that $\theta_1 \models \theta_2$ denotes that $\theta_1$ entails $\theta_2$. Then, we say that $\theta_2$ is weaker than $\theta_1$.

The operational semantics of a program is given in terms of its "derivations", which
are sequences of "reductions" between "states". A state $\langle G \mid \theta \rangle$ consists of a goal $G$ and
a constraint store (or store for short) $\theta$. We use :: to denote concatenation of sequences
and we assume for simplicity that the underlying constraint solver is complete. A state
$S = \langle L :: G \mid \theta \rangle$ where $L$ is a literal can be reduced to a state $S'$ as follows:

1. $\langle L :: G \mid \theta \rangle \leadsto \langle G \mid \theta \wedge L \rangle$ if $L$ is a constraint and $\theta \wedge L$ is satisfiable.

2. $\langle L :: G \mid \theta \rangle \leadsto \langle B :: G \mid \theta \rangle$ if $L$ is an atom of the form $p(t_1, \ldots, t_n)$,
for some rule $(L \texttt{:-} B) \in \mathrm{defn}(L)$.

We use $S \leadsto S'$ to indicate that a reduction can be applied to state $S$ to obtain state
$S'$. Also, $S \leadsto^{*} S'$ indicates that there is a sequence of reduction steps from state $S$
to state $S'$. A query is a pair $(L, \theta)$, where $L$ is a literal and $\theta$ a store, for which the
(C)LP system starts a computation from state $\langle L \mid \theta \rangle$. A finished derivation from a
query $(L, \theta)$ is successful if the last state is of the form $\langle \square \mid \theta' \rangle$, where $\square$ denotes the
empty goal sequence. In that case, the constraint $\bar{\exists}_L \theta'$ is an answer to $S$. We denote
by answers($Q$) the set of answers to a query $Q$.


pred-Assertions and their Semantics. Assertions are linguistic constructions for
expressing properties of programs. Herein, we will use the pred-assertions of (Hermenegildo
et al. 1999; Puebla et al. 2000a; Puebla et al. 2000b), for which we follow the
formalization of (Stulova et al. 2014). These assertions allow specifying certain conditions on
the constraint store that must hold at certain points of program derivations. In
particular, they allow stating sets of preconditions and conditional postconditions for a given
predicate. A set of assertions for a predicate is of the form:

    :- pred Head : Pre_1 => Post_1.
    ...
    :- pred Head : Pre_n => Post_n.

where Head is a normalized atom that denotes the predicate that the assertions apply
to, and the $Pre_i$ and $Post_i$ are (DNF) formulas that refer to the variables of Head. We
assume the $Pre_i$ and $Post_i$ to be DNF formulas of prop literals, which specify conditions
on the constraint store. A prop literal $L$ succeeds trivially for $\theta$ in program $P$, denoted
$\theta \Rightarrow_P L$, iff $\exists \theta' \in \mathrm{answers}((L, \theta))$ such that $\theta \models \theta'$.

A set of assertions as above states that in any execution state $\langle Head :: G \mid \theta \rangle$ at least
one of the $Pre_i$ conditions should hold, and that, given the $(Pre_i, Post_i)$ pair(s) where
$Pre_i$ holds, then, if Head succeeds, the corresponding $Post_i$ should hold upon success.
More formally, given a predicate represented by a normalized atom Head, and the
corresponding set of assertions $A = \{A_1, \ldots, A_n\}$, with $A_i$ = ":- pred Head : $Pre_i$ => $Post_i$.",
such assertions are normalized into a set of assertion conditions $\{C_0, C_1, \ldots, C_n\}$, with:

$$C_i = \begin{cases} \mathrm{calls}(Head, \bigvee_{j=1}^{n} Pre_j) & i = 0 \\ \mathrm{success}(Head, Pre_i, Post_i) & i = 1..n \end{cases}$$

If there are no assertions associated with Head then the corresponding set of
conditions is empty. The set of assertion conditions for a program is the union of the assertion
conditions for each of the predicates in the program.

The calls(Head, ...) conditions encode the checks that ensure that the calls to the
predicate represented by Head are within those admissible by the set of assertions,
and we thus call them the calls assertion conditions. The success(Head, $Pre_i$, $Post_i$)
conditions encode the checks for compliance of the successes for particular sets of calls,
and we thus call them the success assertion conditions.
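The normalization step can be illustrated with a minimal Python sketch. The encoding is an assumption made only for this illustration: a DNF formula is a list of conjunctions, each conjunction is a list of prop literals written as strings, and the function name `assertion_conditions` is hypothetical. Merging identical preconditions in the calls condition is also a choice of this sketch.

```python
from typing import List, Tuple

# Hypothetical encoding: a DNF formula is a list of conjunctions,
# and each conjunction is a list of prop literals written as strings.
DNF = List[List[str]]

def assertion_conditions(head: str, assertions: List[Tuple[DNF, DNF]]):
    """Normalize pred assertions (Pre_i, Post_i) into assertion conditions."""
    if not assertions:
        return []  # no assertions: the set of conditions is empty
    # C_0: disjunction of all preconditions (identical conjunctions merged,
    # which is a simplification made by this sketch).
    calls_pre: DNF = []
    for pre, _post in assertions:
        for conj in pre:
            if conj not in calls_pre:
                calls_pre.append(conj)
    conditions = [("calls", head, calls_pre)]
    # C_1..C_n: one success condition per assertion.
    for pre, post in assertions:
        conditions.append(("success", head, pre, post))
    return conditions

a1 = ([["int(X)", "var(Y)"]], [["int(X)", "int(Y)"]])
a2 = ([["atm(X)", "var(Y)"]], [["atm(X)", "atm(Y)"]])
conds = assertion_conditions("p(X,Y)", [a1, a2])
```

Here `conds` holds one calls condition followed by one success condition per assertion, and a predicate without assertions yields the empty list.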
We now turn to the operational semantics with assertions, which checks whether
assertion conditions hold or not while computing the derivations from a query. In order
to keep track of any violated assertion conditions, we add labels to the assertion
conditions. Given the atom $L_a$ and the corresponding set of assertion conditions $A_C$, $A_C^{\#}(L_a)$
denotes the set of labeled assertion condition instances for $L_a$, where each is of the form
$c\#C_a$, such that $\exists C \in A_C$, $C = \mathrm{calls}(L, Pre)$ (or $C = \mathrm{success}(L, Pre, Post)$), $\sigma$ is a
renaming s.t. $L = \sigma(L_a)$, $C_a = \mathrm{calls}(L_a, \sigma(Pre))$ (or $C_a = \mathrm{success}(L_a, \sigma(Pre), \sigma(Post))$),
and $c$ is an identifier that is unique for each $C_a$. We also introduce an extended program
state of the form $\langle G \mid \theta \mid \mathcal{E} \rangle$, where $\mathcal{E}$ denotes the set of identifiers for falsified assertion
condition instances. For the sake of readability, we write labels in negated form when
they appear in the error set. We also extend the set of literals with syntactic objects of
the form check($L$, $c$) where $L$ is a literal and $c$ is an identifier for an assertion condition
instance, which we call check literals. Thus, a literal is now a constraint, an atom or a
check literal. We can now recall the notion of Reductions in Programs with Assertions
from (Stulova et al. 2014), which is our starting point: a state $S = \langle L :: G \mid \theta \mid \mathcal{E} \rangle$,
where $L$ is a literal, can be reduced to a state $S'$, denoted $S \leadsto_A S'$, as follows:

1. If $L$ is a constraint or $L = X(t_1, \ldots, t_n)$, then $S' = \langle G' \mid \theta' \mid \mathcal{E} \rangle$ where $G'$ and $\theta'$
are obtained in the same manner as in $\langle L :: G \mid \theta \rangle \leadsto \langle G' \mid \theta' \rangle$.

2. If $L$ is an atom and $\exists (L \texttt{:-} B) \in \mathrm{defn}(L)$, then $S' = \langle B :: G' \mid \theta \mid \mathcal{E}' \rangle$ where:

$$\mathcal{E}' = \begin{cases} \mathcal{E} \cup \{\bar{c}\} & \text{if } \exists\, c\#\mathrm{calls}(L, Pre) \in A_C^{\#}(L) \text{ s.t. } \theta \not\Rightarrow_P Pre \\ \mathcal{E} & \text{otherwise} \end{cases}$$

and $G' = \mathrm{check}(L, c_1) :: \ldots :: \mathrm{check}(L, c_n) :: G$ such that
$c_i\#\mathrm{success}(L, Pre_i, Post_i) \in A_C^{\#}(L) \wedge \theta \Rightarrow_P Pre_i$.

3. If $L$ is a check literal check($L'$, $c$), then $S' = \langle G \mid \theta \mid \mathcal{E}' \rangle$ where

$$\mathcal{E}' = \begin{cases} \mathcal{E} \cup \{\bar{c}\} & \text{if } c\#\mathrm{success}(L', \_, Post) \in A_C^{\#}(L') \wedge \theta \not\Rightarrow_P Post \\ \mathcal{E} & \text{otherwise} \end{cases}$$

3 Run-time Checking with Caching 

The standard operational semantics with run-time checking revisited in the previous
section has the same potential problems as other approaches which perform exhaustive
tests: it can be prohibitively expensive, both in terms of time and memory overhead.
For example, checking that the first argument of the length/2 predicate is a list at
each recursive step turns the standard $O(n)$ algorithm into $O(n^2)$.
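The quadratic blow-up can be made concrete by counting visited list cells. The following Python sketch is only a cost model under a simplifying assumption (each list check walks every remaining cons cell and nothing else is counted); the function name `list_check_cost` is hypothetical.

```python
def list_check_cost(n: int) -> int:
    """Cells visited when a list of length n is re-checked at every recursive call."""
    visited = 0
    remaining = n
    while remaining > 0:
        visited += remaining  # the list check walks the whole remaining list
        remaining -= 1        # the recursive call receives the tail
    return visited

# The total is n + (n-1) + ... + 1 = n(n+1)/2, i.e., quadratic in n.
cost_100 = list_check_cost(100)
```

Under this model a list of 100 elements costs 5050 cell visits instead of 100, which is the kind of repeated work that caching is meant to avoid.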

As mentioned in the introduction, our objective is to develop an effective solution
to this problem based on memoizing property checks. An observation that works in
our favor is that many of the properties of interest in the checking process (such as,
e.g., regtype instantiation checking) are monotonic. That is, we will concentrate on
properties such that, for all property checks $L$, if $\langle \_ \mid \theta \rangle \leadsto^{*} \langle \_ \mid \theta' \rangle$ and $\theta \Rightarrow_P L$ then
$\theta' \Rightarrow_P L$. In this context it clearly seems attractive to keep $L$ in the store so that it does
not need to be recomputed. However, memoizing every checked property may also have
prohibitive costs in terms of memory overhead. A worst-case scenario would multiply
the memory needs by the number of call patterns to properties, which can be large in
realistic programs. In addition, looking for stored results in the store obviously also has
a cost that must be taken into account.
Operational Semantics with Caching. We base our approach on an operational
semantics which modifies the run-time checking to maintain and use a cache store. The cache
store $M$ is a special constraint store which temporarily holds results from the
evaluation of prop literals w.r.t. the standard constraint store $\theta$. We introduce an extended
program state of the form $\langle G \mid \theta \mid M \mid \mathcal{E} \rangle$ and a cached version of "succeeds trivially":
given a prop literal $L$, it succeeds trivially for $\theta$ and $M$ in program $P$, denoted $\theta \Rightarrow_P^M L$,
iff either $L \in M$ or $\theta \Rightarrow_P L$. Also, the cache store is updated based on the results of the
prop checks, formalized in the following definitions:

Definition 1 (Updates on the Cache Store)

Let us consider a DNF formula $Props = \bigvee_{i=1}^{n} (\bigwedge_{j=0}^{m(i)} L_{ij})$ where each $L_{ij}$ is a prop
literal. By $\mathrm{lits}(Props) = \{L_{ij} \mid i \in [1:n], j \in [0:m(i)]\}$ we denote the set of all
literals which appear in $Props$. The cache update operation is defined as a function
$\mathrm{upd}(\theta, M, Props)$ such that:

$$\mathrm{upd}(\theta, M, Props) \subseteq M \cup \{L \mid (\theta \Rightarrow_P L) \wedge (L \notin M) \wedge (L \in \mathrm{lits}(Props))\}$$

Note that a precise definition of cache update is left open in this semantics.
Contrary to $\theta$, updates to the cache store $M$ are not monotonic since we allow the cache
to "forget" information as it fills up, i.e., we assume from the start that $M$ is of
limited capacity. However, that information can always be recovered via recomputation
of property checks. In practice the exact cache behavior depends on parts of the
low-level abstract machine state that are not available at this abstraction level. It will be
described in detail in later sections.
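Since the definition only bounds the result of a cache update from above, many concrete behaviors are admissible. The following Python sketch shows one of them, under assumptions that are not part of the semantics: literals are strings, the store is abstracted as a `holds` predicate, the capacity is an explicit parameter, and eviction picks an arbitrary entry.

```python
from typing import Callable, Iterable, Set

def upd(holds: Callable[[str], bool], cache: Set[str],
        props: Iterable[Iterable[str]], capacity: int) -> Set[str]:
    """One admissible cache update: add verified literals, forget when full."""
    new_cache = set(cache)
    if capacity <= 0:
        return new_cache              # a cache without room never grows
    for conj in props:                # props is a DNF formula
        for lit in conj:
            if lit not in new_cache and holds(lit):
                if len(new_cache) >= capacity:
                    new_cache.pop()   # arbitrary eviction ("forgetting")
                new_cache.add(lit)
    return new_cache

store = {"int(X)", "var(Y)"}          # literals that succeed trivially
props = [["int(X)", "var(Y)"], ["atm(X)", "var(Y)"]]
m1 = upd(store.__contains__, set(), props, capacity=8)
```

Whatever the eviction choice, the result stays within the old cache plus the literals of the formula that hold, which is all the definition requires.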

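Definition 2 (Reductions with Assertions and Cache Store)

A state $S = \langle L :: G \mid \theta \mid M \mid \mathcal{E} \rangle$, where $L$ is a literal, can be reduced to a state $S'$,
denoted $S \leadsto_{A+M} S'$, as follows:

1. If $L$ is a constraint or $L = X(t_1, \ldots, t_n)$, then $S' = \langle G' \mid \theta' \mid M \mid \mathcal{E} \rangle$ where $G'$ and
$\theta'$ are obtained in the same manner as in $\langle L :: G \mid \theta \rangle \leadsto \langle G' \mid \theta' \rangle$.

2. If $L$ is an atom and $\exists (L \texttt{:-} B) \in \mathrm{defn}(L)$, then
$S' = \langle B :: G' \mid \theta \mid M' \mid \mathcal{E}' \rangle$ where:

$$\mathcal{E}' = \begin{cases} \{\bar{c}\} \cup \mathcal{E} & \text{if } \exists\, c\#\mathrm{calls}(L, Pre) \in A_C^{\#}(L) \text{ s.t. } \theta \not\Rightarrow_P^M Pre \\ \mathcal{E} & \text{otherwise} \end{cases}$$

$M' = \mathrm{upd}(\theta, M, Pre)$ and $G' = \mathrm{check}(L, c_1) :: \ldots :: \mathrm{check}(L, c_n) :: G$ such that
$c_i\#\mathrm{success}(L, Pre_i, Post_i) \in A_C^{\#}(L) \wedge \theta \Rightarrow_P^M Pre_i$.

3. If $L$ is a check literal check($L'$, $c$), then $S' = \langle G \mid \theta \mid M' \mid \mathcal{E}' \rangle$ where

$$\mathcal{E}' = \begin{cases} \{\bar{c}\} \cup \mathcal{E} & \text{if } c\#\mathrm{success}(L', \_, Post) \in A_C^{\#}(L') \wedge \theta \not\Rightarrow_P^M Post \\ \mathcal{E} & \text{otherwise} \end{cases}$$

and $M' = \mathrm{upd}(\theta, M, Post)$.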


4 Implementation of Run-time Checking with Caching 


We use the traditional definitional transformation (Puebla et al. 2000b) as a basis of
our implementation of the operational semantics with cached checks. This consists of a
program transformation that introduces wrapper predicates that check calls and success
assertion conditions while running on a standard (C)LP system. However, we propose
a novel transformation that, in contrast to previous approaches, groups all assertion
conditions for the same predicate together to produce optimized checks.

Given a program $P$, for every predicate $p$ the transformation replaces all clauses
$p(\bar{x}) \leftarrow body$ by $p'(\bar{x}) \leftarrow body$, where $p'$ is a new predicate symbol, and inserts the
wrapper clauses given by $\mathrm{wrap}(p(\bar{x}), p')$. The wrapper generator is defined as follows:

$$\mathrm{wrap}(p(\bar{x}), p') = \left\{ \begin{array}{l} p(\bar{x}) \texttt{ :- } p_C(\bar{x}, \bar{r}),\; p'(\bar{x}),\; p_S(\bar{x}, \bar{r}). \\ p_C(\bar{x}, \bar{r}) \texttt{ :- } ChecksC. \\ p_S(\bar{x}, \bar{r}) \texttt{ :- } ChecksS. \end{array} \right\}$$

where ChecksC and ChecksS are the optimized compilation of pre- and
postconditions $\bigvee_{i=1}^{n} Pre_i$ and $\bigwedge_{i=1}^{n} (Pre_i \rightarrow Post_i)$ respectively, for $c_0\#\mathrm{calls}(p(\bar{x}), \bigvee_{i=1}^{n} Pre_i)$,
$c_i\#\mathrm{success}(p(\bar{x}), Pre_i, Post_i) \in A_C^{\#}(p(\bar{x}))$; and the additional status variables $\bar{r}$ are used
to communicate the results of each $Pre_i$ evaluation to the corresponding $(Pre_i \rightarrow Post_i)$
check. This way, without any modifications to the literals calling $p$ in the bodies of
clauses in $P$ (and in any other modules that contain calls to $p$), after the
transformation run-time checks will be performed for all these calls to $p$ since $p$ (now $p'$) will be
accessed via the wrapper predicate.

The compilation of checks for assertion conditions emits a series of calls to a
reify_check(P,R) predicate, which accepts as the first argument a property and unifies
its second argument with 1 or 0, depending on whether the property check succeeded
or not. The results of those reified checks are then combined and evaluated as boolean
algebra expressions using bitwise operations and the Prolog is/2 predicate. That is,
the logical operators $(A \vee B)$, $(A \wedge B)$, and $(A \rightarrow B)$ used in encoding assertion
conditions are replaced by their bitwise logic counterparts R is A \/ B, R is A /\ B,
R is (A # 1) \/ B, respectively.
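The bitwise encoding can be checked with a small Python sketch, reading # as bitwise exclusive or so that (A # 1) flips a 0/1 status value. The helper names are hypothetical and only mirror the three Prolog expressions above.

```python
def reify(check: bool) -> int:
    """Reify a property check as 1 (succeeded) or 0 (failed)."""
    return 1 if check else 0

def b_or(a: int, b: int) -> int:
    return a | b            # counterpart of the bitwise "or" expression

def b_and(a: int, b: int) -> int:
    return a & b            # counterpart of the bitwise "and" expression

def b_implies(a: int, b: int) -> int:
    return (a ^ 1) | b      # (A xor 1) or B, i.e., not-A or B

# Truth table of the implication encoding as (a, b, result) triples.
truth_table = [(a, b, b_implies(a, b)) for a in (0, 1) for b in (0, 1)]
```

The only row of the table that yields 0 is a = 1, b = 0, which matches logical implication: a postcondition is only required when its precondition held.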

The purpose of reification and this compilation scheme is to make it possible to
optimize the logic formulae containing properties that result from the combination of several
pred assertions (i.e., the assertion conditions). The optimization consists in reusing the 
reified status R when possible, which happens in two ways. First, the prop literals which 
appear in Pre or Post formulas are only checked once (via reify_check/2) and then 
their reified status R is reused when needed. Second, the reified status of each Pre 
conjunction is reused both in ChecksC and ChecksS. 

In practice the $\mathrm{wrap}(p(\bar{x}), p')$ clause generator shares the minimum number of status
variables and omits trivial assertion conditions, i.e., those with true conditions in one
of their parts. For instance, excluding $p_S(\bar{x}, \bar{r})$ preserves low-level optimizations such as
last call optimization.¹

¹ Even though in this work the $p_C(\bar{x}, \bar{r})$ and $p_S(\bar{x}, \bar{r})$ predicates follow the usual bytecode-based
compilation path, note that they have a concrete structure that is amenable to further optimizations
(like specialized WAM-level instructions or a dedicated interpreter).

Example 1 (Program transformation)
Consider the following annotated program:

    :- pred p(X,Y) : (int(X), var(Y)) => (int(X), int(Y)).   % A1
    :- pred p(X,Y) : (int(X), var(Y)) => (int(X), atm(Y)).   % A2
    :- pred p(X,Y) : (atm(X), var(Y)) => (atm(X), atm(Y)).   % A3

    p(1,42).  p(2,gamma).  p(a,alpha).


From the set of assertions {A1, A2, A3} the following assertion conditions are
constructed:

$C_0 = \mathrm{calls}(p(X,Y),\; (int(X) \wedge var(Y)) \vee (atm(X) \wedge var(Y)))$
$C_1 = \mathrm{success}(p(X,Y),\; (int(X) \wedge var(Y)),\; (int(X) \wedge int(Y)))$
$C_2 = \mathrm{success}(p(X,Y),\; (int(X) \wedge var(Y)),\; (int(X) \wedge atm(Y)))$
$C_3 = \mathrm{success}(p(X,Y),\; (atm(X) \wedge var(Y)),\; (atm(X) \wedge atm(Y)))$


The resulting optimized program transformation is:

    p(X,Y) :-
        p_c(X,Y,R3,R4),
        p'(X,Y),
        p_s(X,Y,R3,R4).

    p_c(X,Y,R3,R4) :-
        reify_check(atm(X),R0),
        reify_check(int(X),R1),
        reify_check(var(Y),R2),
        R3 is R1/\R2,
        R4 is R0/\R2,
        Rc is R3\/R4,
        error_if_false(Rc).

    p_s(X,Y,R3,R4) :-
        reify_check(atm(X),R5),
        reify_check(int(X),R6),
        reify_check(atm(Y),R7),
        reify_check(int(Y),R8),
        Rs is (R3#1\/(R6/\R8))
           /\ (R3#1\/(R6/\R7))
           /\ (R4#1\/(R5/\R7)),
        error_if_false(Rs).

    p'(1,42).  p'(2,gamma).  p'(a,alpha).


Please note that A1 and A2 have identical preconditions, and this is reflected in having
only one property combination, R3, for both of them. The same holds for individual
properties: across the preconditions of A1, A2, and A3 that make up $C_0$, the literal int(X)
appears twice and the literal var(Y) three times, but each literal is checked only once in the code.
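The sharing of status variables in Example 1 can be traced with a small Python simulation of the wrapper. Everything about the encoding is an assumption of this sketch: Python ints and strings stand for Prolog integers and atoms, `None` stands for an unbound variable, `p_prime` only supports calls with a bound first argument, and errors are simply collected in a list.

```python
def is_int(v): return isinstance(v, int)
def is_atm(v): return isinstance(v, str)
def is_var(v): return v is None        # None stands for an unbound variable

FACTS = [(1, 42), (2, "gamma"), ("a", "alpha")]

def p_prime(x):
    """Stand-in for the renamed predicate: answer for a bound first argument."""
    for fx, fy in FACTS:
        if fx == x:
            return fy
    return None

def p(x, y=None):
    errors = []
    # Calls checks: each property is reified once, then combined bitwise.
    r0, r1, r2 = int(is_atm(x)), int(is_int(x)), int(is_var(y))
    r3, r4 = r1 & r2, r0 & r2          # one status per distinct precondition
    if not (r3 | r4):
        errors.append("calls")
    y = p_prime(x)
    # Success checks: r3 and r4 are reused, not recomputed.
    r5, r6, r7, r8 = int(is_atm(x)), int(is_int(x)), int(is_atm(y)), int(is_int(y))
    rs = ((r3 ^ 1) | (r6 & r8)) & ((r3 ^ 1) | (r6 & r7)) & ((r4 ^ 1) | (r5 & r7))
    if not rs:
        errors.append("success")
    return y, errors
```

In this simulation `p("a")` passes both checks, while `p(1)` reports a success error because A1 and A2 share a precondition but require different types for Y, and `p(1.5)` reports a calls error.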


The error-reporting predicates error_if_false/1 in the instrumented code
implement the $\mathcal{E}$ update in the operational semantics. These predicates abstract away the
details of whether errors produce exceptions, are reported to the user, or are simply
recorded.

The cache itself is accessed fundamentally within the reify_check/2 predicate.
Although the concrete details for a particular use case (and a corresponding set of
experiments) will be described later, we discuss the main issues and trade-offs involved
in cache implementation in this context. First, although the cache will in general be 
software-defined and dynamically allocated, in any case the aim is to keep it small with 
a bounded limit (typically a fraction of the stacks), so that it does not have a significant 
impact on the memory consumption of the program. 

Also, in order to ensure efficient lookups and insertions of the cache elements, it may
be advantageous not to store the property calls literally but rather their memory
representation. This means however that, e.g., for structure-copying term representation, a
property may appear more than once in the cache for the same term if its representation 
appears several times in memory. 

Furthermore, insertion and removal (eviction) of entries can be optimized using 
heuristics based on the cost of checks (e.g., not caching simple checks like integer/1), 
the entry index number (such as direct-mapped), the history of entry accesses (such as 
LRU or least-recently used), or caching contexts (such as caching depth limits during 
term traversal in regular type checks). 
Finally, failure and some of the stack maintenance operations such as reallocations for
stack overflows, garbage collection, or backtracking need updates on the cache entries 
(due to invalidation or pointer reallocation). Whether it is more optimal to evict some 
or all entries, or update them is a nontrivial decision that defines another dimension in 
heuristics. 


5 Application to Regular Type Checking 

As concrete properties to be used in our experiments we select a simple yet useful subset
of the properties that can be used in assertions: the regular types (Dart and Zobel 1992)
often used in (C)LP systems. Regular types are properties whose definitions are regular
programs, defined by a set of clauses, each of the form "$p(x, v_1, \ldots, v_n)$ :- $B_1, \ldots, B_k$"
where $x$ is a linear term (whose variables, which are called term variables, are unique);
the terms $x$ of different clauses do not unify; $v_1, \ldots, v_n$ are unique variables, which
are called parametric variables; and each $B_i$ is either $t(z)$ (where $z$ is one of the term
variables and $t$ is a regular type expression) or $q(y, t_1, \ldots, t_m)$ (where $q/(m+1)$ is a
regular type, $t_1, \ldots, t_m$ are regular type expressions, and $y$ is a term variable). A regular
type expression is either a parametric variable or a parametric type functor applied to
some of the parametric variables. A parametric type functor is a regular type, defined
by a regular program.


Instantiation checks. Intuitively, a prop literal $L$ succeeds trivially if $L$ succeeds for $\theta$
without adding new "relevant" constraints to $\theta$ (Hermenegildo et al. 1999; Puebla et al.
2000a).² A standard technique to check membership on regular types is based on tree
automata. In particular, the regular types defined above are recognizable by top-down
deterministic automata.

This also includes parametric regtypes, provided their parameters are instantiated 
with concrete types during checking, since then they can be reduced to non-parametric 
regtypes. 

Let us recall some basics on deterministic tree automata, as they will be the basis of
our regtype checking algorithm. A tree automaton is a tuple $A = \langle \Sigma, Q, \Delta, Q_f \rangle$ where
$\Sigma$, $Q$, $\Delta$, $Q_f$ are finite sets such that: $\Sigma$ is a signature, $Q$ is a finite set of states, $\Delta$
is the set of transitions of the form $f(q_1, \ldots, q_n) \rightarrow q$ where $f \in \Sigma$, $q, q_1, \ldots, q_n \in Q$
with $n$ being the arity of $f$, and $Q_f \subseteq Q$ is the set of final states. The automaton is
top-down deterministic if $|Q_f| = 1$ and for all $f \in \Sigma$ and all $q \in Q$ there exists at most
one sequence $q_1, \ldots, q_n$ such that $f(q_1, \ldots, q_n) \rightarrow q \in \Delta$.

Translation of regular types (or instances of parametric regular types for particular
types) from Prolog clauses into deterministic top-down tree automata rules is
straightforward. This representation is suitable for low-level encoding (e.g., using integers for
$q_i$ states and a map between each $q_i$ state and its definition).


² Note that checks are performed via entailment checks w.r.t. primitive (Herbrand) constraints. That
means that term(X) (which is always true) and ground(X) (denoting all possible ground terms),
despite having the same minimal Herbrand models as predicates, do not have the same s-model and
are not interchangeable as regtype instantiation checks.
Algorithm 1 Check that the type of the term stored at x is t, at depth d.

    function RegCheck(x, t, d)
        Find C ∈ Constructors(t) so that Functor(C) = Functor(x),
            otherwise return False
        if Arity(x) = 0 then                        ▷ Atomic value, not cached
            return True
        else if CacheLookup(x, t) then              ▷ Already in cache
            return True
        else if ∀i ∈ [1, Arity(x)]. RegCheck(Arg(i, x), Arg(i, C), d + 1) then
            if d < depthLimit then                  ▷ Insert in cache
                CacheInsert(x, t)
            return True                             ▷ In regtype
        else
            return False                            ▷ Not in regtype


Example 2

The following bintree/2 regular type describes a binary tree of elements of type T.
The corresponding translation into tree automata rules for the bintree(int) instance
with $Q_f = \{q_b\}$ is shown below it.

    :- regtype bintree/2.
    bintree(empty,T).
    bintree(tree(LC,X,RC),T) :- bintree(LC,T), T(X), bintree(RC,T).

$$\Delta = \{\; empty \rightarrow q_b, \;\; tree(q_b, q_{int}, q_b) \rightarrow q_b \;\}$$


Algorithm for Checking Regular Types with Caches. We describe the RegCheck
algorithm for regtype checking using caches in Algorithm 1. The reify_check/2 predicate
acts as the interface between RegCheck and the runtime checking framework. The
algorithm is derived from the standard definition of run on tree automata. A run of
a tree automaton $A = \langle \Sigma, Q, \Delta, Q_f \rangle$ on a tree $x \in T_\Sigma$ (terms over $\Sigma$) is a mapping $\rho$
assigning a state to each occurrence (subterm) $f(x_1, \ldots, x_n)$ of $x$ such that:

$$f(\rho(x_1), \ldots, \rho(x_n)) \rightarrow \rho(f(x_1, \ldots, x_n)) \in \Delta$$

A term $x$ is recognized by $A$ if $\rho(x) \in Q_f$. For deterministic top-down recognition,
the algorithm starts with the single state in $Q_f$ (which for simplicity, we will use
to identify each regtype and its corresponding automata) and follows the rules
backwards. The tree automata transition rules for a regtype are consulted with the functions
$Constructors(t) = \{C \mid C \rightarrow t \in \Delta\}$, $Arg(i, u)$ (the i-th argument of a constructor or
term $u$), and $Functor(u)$ (the functor symbol, including arity, of a constructor or term
$u$). Once there is a functor match, the regtypes of the arguments are checked recursively.
To speed up checks, the cache is consulted (CacheLookup(x, t) searches for (x, t))
before performing costly recursion, and valid checks are inserted (CacheInsert(x, t) inserts
(x, t)) if needed (e.g., using heuristics, explained below). The cache for storing results of
regular type checking is implemented as a set data structure that can efficiently insert
and look up (x, t) pairs, where x is a term address³ and t a regular type identifier. The
specific implementation depends on the cache heuristics, as described below.
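As an illustration of Algorithm 1, the following Python sketch checks the bintree(int) instance of Example 2. It is not the prototype's implementation: the term encoding (tuples for compound terms, strings for atoms), the `DELTA` table, the use of Python's `id` as a stand-in for a term address, and the fixed `DEPTH_LIMIT` are all assumptions of this sketch, and the cache is an unbounded set for brevity.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: terms are ints, atom strings, or tuples
# (functor, arg1, ..., argN). DELTA maps a state to its constructors,
# keyed by (functor, arity) and giving the states of the arguments.
DELTA: Dict[str, Dict[Tuple[str, int], Tuple[str, ...]]] = {
    "qb": {("empty", 0): (), ("tree", 3): ("qb", "qint", "qb")},
}
DEPTH_LIMIT = 2
cache: Set[Tuple[int, str]] = set()    # (term address, regtype) pairs

def functor(x):
    return (x[0], len(x) - 1) if isinstance(x, tuple) else (x, 0)

def reg_check(x, t: str, d: int = 0) -> bool:
    if t == "qint":                        # implicit rule for a primitive type
        return isinstance(x, int)
    arg_types = DELTA[t].get(functor(x))
    if arg_types is None:
        return False                       # no constructor with this functor
    if not arg_types:
        return True                        # atomic value, not cached
    if (id(x), t) in cache:
        return True                        # already in cache
    if all(reg_check(a, ta, d + 1) for a, ta in zip(x[1:], arg_types)):
        if d < DEPTH_LIMIT:
            cache.add((id(x), t))          # insert in cache
        return True                        # in regtype
    return False                           # not in regtype

leaf = ("tree", "empty", 1, "empty")
tree = ("tree", leaf, 2, "empty")
ok = reg_check(tree, "qb")
```

After the call, both `tree` (depth 0) and `leaf` (depth 1) have cache entries, so a later check of either term returns without traversing it again. Note that `id` values are only meaningful while the Python objects stay alive, which loosely mirrors the need to invalidate address-based entries.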

³ Since regtype checks are monotonic, this is safe as long as cache entries are properly invalidated on
backtracking, stack movements, and garbage collection. Using addresses is a pragmatic decision to
minimize the overheads of caching.
Complexity. It is easy to show that complexity has $O(1)$ best case (if $x$ was cached)
and $O(n)$ worst case, with $n$ being the number of tree nodes (or term size). In practice,
the caching heuristics can drastically affect performance. For example, assume a full
binary tree of $n$ nodes. Caching all nodes at levels multiple of $c$ will need $n/(2^{c+1} - 1)$
entries, with a constant cost for the worst case check (at most $2^{c+1} - 1$ will be checked,
independently of the size of the term).

Cache Implementation and Heuristics. In order to decide what entries are added and
what entries are evicted to make room for new entries on cache misses, we have
implemented several caching heuristics and their corresponding data structures. Entry
eviction is controlled by replacement policies:

• Least-recently used (LRU) replacement and fully associative. Implemented as a 
hash table whose entries are nodes of a doubly linked list. The most recently 
accessed element is moved to the head and new elements are also added to the 
head. If cache size exceeds the maximal size allowed, the cache is pruned. 

• Direct-mapped cache with collision replacement, with a simple hash function
based on modular arithmetic on the term address. This is simpler but less
predictable.
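The two replacement policies can be sketched in Python as follows. This is an illustration under assumptions rather than the prototype's data structures: `OrderedDict` plays the role of the hash table plus doubly linked list (with the most recently used entry kept at the end instead of the head), keys are (address, regtype) pairs with integer addresses, and the class names are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative set of (address, regtype) keys with LRU eviction."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key) -> bool:
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return True
        return False

    def insert(self, key) -> None:
        self.entries[key] = True
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # prune least recently used

class DirectMappedCache:
    """One slot per index; a colliding insert replaces the old entry."""
    def __init__(self, size: int):
        self.slots = [None] * size

    def _index(self, key) -> int:
        address, _regtype = key
        return address % len(self.slots)       # modular hash on the address

    def lookup(self, key) -> bool:
        return self.slots[self._index(key)] == key

    def insert(self, key) -> None:
        self.slots[self._index(key)] = key
```

The direct-mapped variant does constant work with no bookkeeping per access, at the price of losing entries whenever two addresses map to the same slot; the LRU variant keeps the most useful entries but pays for reordering on every hit.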

The insertion of new entries is controlled by the caching contexts, which include the 
regular type being checked and the location of the check: 

• We do not cache simple properties (like primitive type tests, e.g., integer/1, etc), 
where caching is more expensive than recomputing. 

• We use the check depth level in the cache interface for recursive regular types. 
Checks beyond this threshold depth limit are not cached. This gives priority to 
roots of data structures over internal subterms which may pollute the cache. 

Low-level C implementation. In our prototype, this algorithm is implemented in C⁴ with
some specialized cases (as required for our WAM-based representation of terms, e.g., to
deal with atomic terms, list constructors, etc.). The regtype definition is encoded as a
map between functors (name and arity) and an array of $q$ states for each argument. For
a small number of functors, the map is implemented as an array. Efficient lookup for
many functors is achieved using hash maps. Additionally, a number of implicit transition
rules exist for primitive types (any term to $q_{any}$, integers to $q_{int}$, etc.) that are handled
as special cases.


⁴ Even though the algorithm can be easily implemented as a deterministic Prolog program, we chose in
this work a specialized, lower-level implementation that can interact more directly with the optimized
cache data structures.


6 Experimental Results and Evaluation

To study the impact of caching on run-time overhead, we have evaluated the
run-time checking framework on a set of 7 benchmarks, for regular types. We consider
benchmarks where we perform a series of element insertions in a data structure.
Benchmarks amqueue, set, B-tree, and (binary) tree were adapted from the Ciao libraries;
benchmarks AVL-tree, RB-tree and heap were adapted from the YAP libraries. These
benchmarks can be divided into 4 groups:

(a) simple list-based data structures: amqueue, set; 

(b) balanced tree-based structures that do not change the structural properties of their
nodes on balancing: AVL-tree, heap;

(c) balanced tree-based structures that change node properties: B-tree (changes the 
number of node children), RB-tree (changes node color); 

(d) unbalanced tree structures (tree). 

For each run of the benchmark suite the following parameters were varied: cache
replacement policy (LRU, direct mapping), cache size (1 to 256 cells), and check depth
threshold (1 to 5, and "infinite" threshold for unlimited check depth). The accompanying
table summarizes the results of the experiments. For each combination of the parameters it reports
the optimal caching policy, LRU (L) or direct mapping (D). Also, for each of the
benchmarks it reports an interval within which the worst case check depth varies.

The experiments show that the overhead of checks with depth threshold 2 (storing 
the regtype of the check argument and the regtypes of its arguments) is smaller than or 
equal to the one obtained with unlimited depth limit (FigQ. A depth limit of 1 does 
not allow checks to store enough useful information about terms of most of the data 
structures (compare the overhead increase for amqueue with this and bigger limits), while 
unlimited checks tend to overwrite this information multiple times, so that it cannot 
be reused. At the same time, for data structures represented by large nested terms 
(e.g., nodes of B-trees), deeper limits (3 or 4) for small inputs seem more beneficial for 
capturing such term structure. It can also be observed that the lower cost of element 
insert/lookup operations with the DM cache replacement policy results in having lower 
total overhead than with the LRU policy. 
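
The effect of the depth threshold can be illustrated with a small model. The sketch below is an assumption-laden simplification, not the prototype: terms are Python tuples, the regtype is a hypothetical binary tree type, `DepthLimitedChecker` is an invented name, and the cache is an unbounded dictionary keyed by object identity, so cell overwriting by a replacement policy is deliberately left out. Depth 1 denotes the check argument itself and depth 2 its arguments.

```python
# Hypothetical regtype: tree ::= "nil" | ("node", tree, int, tree)


class DepthLimitedChecker:
    def __init__(self, depth_limit=2):
        self.depth_limit = depth_limit  # None models the "infinite" threshold
        self.cache = {}    # id(term) -> term; holding the term keeps its id valid
        self.visited = 0   # compound nodes actually inspected

    def is_tree(self, term, depth=1):
        if term == "nil":
            return True
        if id(term) in self.cache:
            return True  # previously verified subterm: skip re-checking it
        self.visited += 1
        ok = (isinstance(term, tuple) and len(term) == 4
              and term[0] == "node"
              and isinstance(term[2], int)
              and self.is_tree(term[1], depth + 1)
              and self.is_tree(term[3], depth + 1))
        # The check itself is always exhaustive; the threshold only limits
        # which successful results are stored for later reuse.
        if ok and (self.depth_limit is None or depth <= self.depth_limit):
            self.cache[id(term)] = term
        return ok
```

With two trees that share a subterm, a threshold of 2 lets the second check reuse the result stored for the shared child, whereas a threshold of 1 only remembers the root of the first tree and has to traverse the shared child again.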

While even with caching the cost of the run-time checks still remains significant,
caching does reduce overhead by 1-2 orders of magnitude with respect to the cost of
run-time checking without caching (Fig. 2). Also, the slowdown ratio of programs with
run-time checks using caching is almost constant, in contrast with the linear (or worse)
growth in the case where caching is not used. An important issue that has to be taken
into account here is that most of the benchmarks are rather simple, and that performing
insert operations is much less costly than performing run-time checks on the arguments
of this operation. This explains the observation that checking overhead is the highest
for the set benchmark (Fig. 1), even though it is one of the simplest used in the
experiments.

(Footnote: Note that in general run-time checking is a technique for which non-trivial
overhead can be expected for all but the most trivial properties. It can be conceptually
associated with running the program in the debugger, which typically also introduces
significant cost.)

[Plots omitted. Six panels: cache size = 256, check depth limit = 1, 2, inf.]
Fig. 1: Run-time check overhead ratios for all benchmarks with check depth thresholds
of 1, 2, ∞, and DM (top row) and LRU (bottom row) policies in cache of 256 elements.

[Plots omitted. Two panels: test = min-heap, cache size = 256, regtype check depth = 2;
x-axis: number of insert operations.]
Fig. 2: Absolute and relative running times of the heap benchmark with different rtchecks
configurations, LRU caching policy.

Another factor that affects the overhead ratio is cache size. For smaller caches cell
rewritings occur more often, and thus the optimal cache replacement policy in such cases
is the one with the cheapest operations. For instance, for cache size 32 the optimal policy
for all benchmark groups is DM, while for other cache sizes LRU is in some cases better
as it allows optimizing cell rewritings. This observation is also confirmed by the maximal
check depth in the worst case, which is almost half on average for the benchmarks for
which LRU is the optimal policy (Fig. 3). In the simple data structures of group (a)
the experiments show that it is beneficial to have cheaper cache operations (like those
of caches with DM caching policy), since such structures do not suffer from cache cell
rewritings as much as more complex structures. The same observation is still true for
group (d), where for some inputs the binary tree might grow high and regtype checks
of leaves will pollute the cache with results of checks for those inner nodes on the path
that are not in the cache, overwriting cache entries with regtypes of previously checked
nodes. The DM policy also happens to show better results for group (c) for a similar
reason. Since data structures in this group change essential node properties during the
tree insertion operation, this in practice means that sub-terms that represent inner
tree nodes are (re-)created more often. As a result, with the LRU caching policy the
cache would become populated by check results for these recently created nodes, while
the DM caching policy would allow preserving (and reusing) some of the previously
obtained results. The only group that benefits from LRU is (b), where this policy helps
preserve check results for the tree nodes that are closer to the root (and are more
frequently accessed) and most of the overwrites happen to cells that store leaves.

[Plot omitted. Panel: test = min-heap, check depth limit = 2.]
Fig. 3: Worst case regtype check depth for benchmarks from groups (b) and (c), with
LRU and DM cache replacement policies respectively.
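
The difference between the two replacement policies discussed above can be sketched with two toy caches sharing the same interface. This is an illustrative Python model under stated assumptions, not the optimized C cache of the prototype: the class names, the use of small integers as stand-ins for cached check results, and the modulo-based cell selection are all hypothetical.

```python
from collections import OrderedDict


class DirectMappedCache:
    """Each key maps to exactly one cell; a colliding insert overwrites it."""

    def __init__(self, size):
        self.cells = [None] * size

    def lookup(self, key):
        return self.cells[hash(key) % len(self.cells)] == key

    def insert(self, key):
        # Cheap operation: no bookkeeping, the previous occupant is lost.
        self.cells[hash(key) % len(self.cells)] = key


class LRUCache:
    """When full, evicts the entry that was used least recently."""

    def __init__(self, size):
        self.size = size
        self.cells = OrderedDict()

    def lookup(self, key):
        if key in self.cells:
            self.cells.move_to_end(key)  # a hit refreshes the entry
            return True
        return False

    def insert(self, key):
        self.cells[key] = True
        self.cells.move_to_end(key)
        if len(self.cells) > self.size:
            self.cells.popitem(last=False)  # drop the least recently used key
```

In this model, a frequently looked-up entry (such as a node close to the root) survives in the LRU cache because every hit refreshes it, while in the DM cache its survival depends only on whether a later key happens to map to the same cell. The extra bookkeeping on each LRU operation is the cost that makes DM preferable when rewritings dominate.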

More plots are available in the online appendix (Appendix A). 


7 Conclusions and Related Work 

We have presented an approach to reducing the overhead implied by run-time checking
of properties based on the use of memoization to cache intermediate results of check
evaluation, avoiding repeated checking of previously verified properties. We have provided
an operational semantics with assertion checks and caching and an implementation
approach, including a more efficient program transformation than in previous proposals.
We have also reported on a prototype implementation and provided experimental
results that support that using a relatively small cache leads to very significant decreases
in run-time checking overhead.

The idea of using memoization techniques to speed up checks has attracted some
attention recently (Koukoutos and Kuncak 2014). Their
work (developed independently from ours) is based on adding fields to data structures
to store the properties that have been checked already for such structures. In contrast,
our approach has the advantage of not requiring any modifications to data structure
representation, or to the checking code, program, or core run-time system. Compared
to the approaches that reduce checking frequency, our proposal has the advantage of
being exhaustive (i.e., all tests are checked at all points) while still being much more
efficient than standard run-time checking. Our approach greatly reduces the overhead
when tests are being performed, while allowing the parts for which testing is turned
off to execute at full speed without requiring recompilation. While presented for
concreteness in the context of the Ciao run-time checking framework, we argue that the
approach is general, and the results should carry over to other programming paradigms.


Acknowledgments: Research supported in part by projects EU FP7 318337 ENTRA,
Spanish MINECO TIN2012-39391 StrongSoft, and Madrid Regional Government
S2013/ICE-2731, N-Greens Software.

Appendix A

This appendix includes plots of the run-time checking overhead observed in the set
of 7 benchmarks for different cache replacement policies. There are four groups of
plots:

• overhead ratio plots, where overhead ratio curves are grouped by cache size
and check depth limit (Figures A1 and A5);

• overhead ratio plots, where overhead ratio curves are grouped by benchmark
and check depth limit (Figures A2 and A6);

• maximal regtype check depth reached plots, where check depth curves are
grouped by benchmark and cache size (Figures A3 and A7);

• absolute and relative benchmark execution time plots for benchmarks without
rtchecks, with rtchecks and with both rtchecks and caching (Figures A4 and A8).

A.1 “Least Recently Used” Cache Replacement Policy

Fig. A1: Overhead ratios for all benchmarks, check depth limits 1, 2 and ∞, LRU
caching policy.

[Plots omitted. Panels: test = amqueue, set, AVL-tree, min-heap, RB-tree, 2-3-4 B-tree;
check depth limit = 1, 2, inf; x-axis: number of insert operations.]
Fig. A2: Overhead ratios for each benchmark, check depth limits 1, 2 and ∞, LRU
caching policy.

[Plots omitted. Panels titled by test and check depth limit, e.g. test = amqueue,
check depth limit = 1.]
Fig. A3: Max regtype check depth for each benchmark, check depth limits 1, 2 and
∞, LRU caching policy.

[Plots omitted. Panels titled by test, e.g. test = amqueue and test = RB-tree, with
cache size = 256, regtype check depth = 2.]
Fig. A4: Absolute and relative benchmark running times, cache size 256 elements,
check depth limit 2, LRU caching policy.

A.2 Direct Mapping Cache Replacement Policy

Fig. A5: Overhead ratios for all benchmarks, check depth limits 1, 2 and ∞, DM
caching policy.

Fig. A6: Overhead ratios for each benchmark, check depth limits 1, 2 and ∞, DM
caching policy.

Fig. A7: Max regtype check depth for each benchmark, check depth limits 1, 2 and
∞, DM caching policy.

Fig. A8: Absolute and relative benchmark running times, cache size 256 elements,
check depth limit 2, DM caching policy.